ABSTRACT

Educational Data Mining (EDM) is an emerging discipline concerned with developing methods for exploring the
unique types of data that come from educational systems, and with using those methods to better understand students
and the systems in which they learn.

This paper aims to demonstrate the capabilities of data mining techniques in the context of higher education by
offering a data mining model for the higher education system in technical institutions. We propose a technique for
detecting abnormal values in students' result sheets. To this end, we apply data mining techniques such as
classification and decision trees to large volumes of educational data in order to find errors in the sheets with
respect to scores, grades, or calculation mistakes.

KEYWORDS: Decision Tree, Educational Data Mining (EDM), Classification, WEKA

INTRODUCTION

Data mining is the process of automatically extracting useful information or knowledge from large datasets.
It involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and
relationships in large data sets. Data mining is one step of the Knowledge Discovery in Databases (KDD) process:
KDD is the overall process of extracting models and patterns from large databases, while data mining refers to the
step of applying the discovery algorithm to the data. The contribution of this research is that our results provide
insight into the entire process of applying data mining tools to real-world data sets. In the following sections we
describe the overall methodology of the research, from the selection of a data mining algorithm to the modeling of
the academic performance prediction problem for technical education students. Next, we give a brief description of
decision trees and of the data mining tool WEKA. Finally, we discuss the practical importance of this research and
our conclusions.

Various data mining techniques, such as classification, clustering and rule mining, can be applied to bring out
hidden knowledge from educational data. Prediction can be divided into classification, regression and density
estimation. In classification, the predicted variable is a binary or categorical variable. Some popular
classification methods include decision trees, logistic regression and support vector machines. In regression, the
predicted variable is a continuous variable. Some popular regression methods within educational data mining include
linear regression, neural networks and support vector machine regression. Classification techniques such as decision
trees and Bayesian networks can be used to predict a student's behavior in an educational environment, interest in
a subject, or outcome in an examination.

Decision  Tree 

The concept of decision trees has been developed and refined over many years (Rud, 2001; Han & Kamber, 2006).
A decision tree is a classification scheme that generates a tree and a set of rules representing the model of the
different classes in a given dataset. According to Han and Kamber (2000), a decision tree is a flowchart-like
structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test,
and the leaf nodes represent classes or class distributions. We used J48 in WEKA to perform the prediction analysis.
Decision trees are generated from the training data in a top-down direction. The root node of a decision tree is the
tree's initial state, the first decision node. Each node in a tree contains some data. Based on the algorithm, some
calculations are performed and the node is split into two or more branches. In some cases the node cannot be split;
it then becomes a final decision node (a leaf).
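The flowchart-like structure described above can be illustrated with a minimal Python sketch. The `Node` class, the `classify` function and the example tree are assumptions made for illustration only; the tree shown is hypothetical and is not the tree learned in this study.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A decision-tree node: either an attribute test or a leaf."""
    attribute: Optional[str] = None   # attribute tested at an internal node
    branches: Dict[str, "Node"] = field(default_factory=dict)  # outcome -> child
    label: Optional[str] = None       # class label if this node is a leaf

    def is_leaf(self) -> bool:
        return self.label is not None


def classify(node: Node, record: Dict[str, str]) -> str:
    """Follow the branch matching each test outcome until a leaf is reached."""
    while not node.is_leaf():
        node = node.branches[record[node.attribute]]
    return node.label


# Hypothetical tree (attribute names taken from Table 1, structure invented).
tree = Node(
    attribute="Private_Tuision",
    branches={
        "yes": Node(label="A"),
        "no": Node(
            attribute="College_Type",
            branches={"Govt": Node(label="C"), "Private": Node(label="B")},
        ),
    },
)

print(classify(tree, {"Private_Tuision": "no", "College_Type": "Private"}))
```

Each internal node holds one attribute test, each dictionary key is one outcome of that test, and a node with a label is a leaf, matching the three roles named in the definition.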

METHODOLOGY 

This section describes the process we followed to collect and analyze the academic performance data. We discuss our
selection of a data mining tool, followed by the difficult task of preparing the data for analysis. We then present
our model of the academic performance prediction problem.

Source  of  Database  and  Description 

The database was collected through questionnaires filled in by the student concerned, a teacher, or the student's
parent. The survey was designed to gather information about the perceived educational status of the parents and
demographic information about the student, such as name, address, age, sex and education. The survey consisted of
26 questions. Some questions were to be answered yes or no, but in general respondents were given more options.
The data was originally stored in Excel format as a two-dimensional table of 373 instances, each data point
corresponding to the responses of one individual. The dataset was then converted into Attribute-Relation File Format
(ARFF) for effective and efficient use in the WEKA system. Table 1 describes each attribute of the database.
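The conversion step can be sketched as follows. This is a minimal illustration, assuming the spreadsheet has first been exported to CSV; the helper `to_arff`, the relation name `edudata` (taken from the run information), the sample rows and the choice of three attributes are assumptions, not the procedure actually used in the study.

```python
import csv
import io


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, spec) pairs, where spec is either the
    string "numeric" or a list of the allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        if spec == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            values = ",".join(spec)
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(row[name]) for name, _ in attributes))
    return "\n".join(lines) + "\n"


# Hypothetical CSV export with two invented records.
csv_text = "Sex,College_Type,Previous_Result\nM,Private,72.5\nF,Govt,64\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

attributes = [
    ("Sex", ["M", "F"]),
    ("College_Type", ["Govt", "Private"]),
    ("Previous_Result", "numeric"),
]

arff_text = to_arff("edudata", attributes, rows)
print(arff_text)
```

An ARFF file consists of a `@relation` line, one `@attribute` declaration per column (nominal values listed in braces, or `numeric`), and a `@data` section with one comma-separated line per instance.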


Table 1: Description of Datasets

S. No.  Attribute Name         Description
1       College_Code           College code
2       Name_Place             Place of college
3       Name_Block             Name of block
4       City                   Khandwal (M.P.)
5       Scholer_Number         Student scholar number
6       Name_Student           Name of student
7       Student_Father_Name    Student's father's name
8       Student_Mother_Name    Student's mother's name
9       Age                    Age of student (06-10 years)
10      Sex                    Gender (M, F)
11      Class                  Class (iii, iv, v)
12      Category               Category (SC, ST, Gen, OBC)
13      College_Type           College type (Govt., Private)
14      Location_College       Rural or urban
15      No_Faculty             Number of faculty members in the college
16      Family_Size            Number of members in the student's family
17      Living_Zone            Residential area of student
18      Father_Edu             Father's education
19      Father_Occup           Occupation of student's father
20      Mother_Edu             Mother's education
21      Mother_Occu            Occupation of student's mother
22      Family_Income          Family income
23      Private_Tuision        Does the student take private tuition?
24      Attendence_College     Attendance of the student in class
25      Previous_Result        Previous year's result of the student, in percentage
26      Grade_Previous_Result  Previous year's result of the student (grade)

The information gain of an attribute with respect to a set of examples is the expected reduction in entropy that
results from splitting the examples according to the values of that attribute. This measure is used in decision tree
induction and is useful for identifying the attributes that have the greatest influence on classification.

The aim of data preprocessing is to improve the quality of the data, which helps improve "the accuracy and
efficiency of the subsequent mining process" (Han and Kamber 2007). Outliers often decrease the accuracy and
efficiency of the models. Data preprocessing transforms the original data into a shape suitable for a particular
mining algorithm, so before applying the data mining algorithm a number of general preprocessing tasks have to be
addressed (V. Ramesh et al., 2011). Preprocessing is one of the important stages of the data mining process, in
which the relevant data are grouped and cleaned. This can be done with any of the classification algorithms; in this
study we use the J48 classifier with the help of the WEKA software.

Preparing  the  Data  and  Selecting  the  Relevant  Attribute 

In  the  data  preparation  phase  we  selected  the  relevant  attributes  from  the  available  data,  created  meaningful 
groups  within  the  attributes  and  derived  new  attributes  from  our  knowledge  of  the  domain. 
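The three preparation activities (selecting attributes, grouping values, deriving new attributes) might look like the sketch below. The grade cut-offs, the set of dropped identifier columns and the helper names are assumptions for illustration; the paper does not state the actual bands or selection criteria.

```python
# Hypothetical grade bands as (minimum percentage, grade); the real cut-offs
# used in the study are not given in the paper.
GRADE_BANDS = [(75.0, "A"), (60.0, "B"), (45.0, "C")]
LOWEST_GRADE = "D"

# Assumed set of identifier-like columns that are not useful as predictors.
IDENTIFIER_ATTRIBUTES = {"College_Code", "Scholer_Number", "Name_Student"}


def grade_from_percentage(pct):
    """Group a percentage into one of four grade classes."""
    for minimum, grade in GRADE_BANDS:
        if pct >= minimum:
            return grade
    return LOWEST_GRADE


def prepare(record):
    """Drop identifier columns and derive the grade from the percentage."""
    kept = {k: v for k, v in record.items() if k not in IDENTIFIER_ATTRIBUTES}
    kept["Grade_Previous_Result"] = grade_from_percentage(
        float(kept["Previous_Result"])
    )
    return kept


# Invented example record.
raw = {
    "College_Code": "C01",
    "Name_Student": "X",
    "Sex": "F",
    "Previous_Result": "62.4",
}
prepared = prepare(raw)
print(prepared)
```

Four bands are used here only because the class attribute is later described as having four classes.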


Figure 1: View of Class Attribute (WEKA Explorer)

Building the Classification Model

The next step is to build the classification model using the decision tree method. The decision tree is a practical
method because it is relatively fast to build and can easily be converted into simple classification rules. The
method relies mainly on the information gain metric, which determines the most useful attribute to split on at each
node. Information gain is in turn based on the entropy measure.
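The relationship between entropy and information gain can be made concrete with a short sketch. The function names, the `Grade` target and the four toy records are assumptions for illustration. Note that J48 implements C4.5, which normalizes this quantity into a gain ratio; only the plain information gain described in the text is shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(records, attribute, target):
    """Expected reduction in entropy of `target` after splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(
        len(g) / len(records) * entropy(g) for g in groups.values()
    )
    return base - remainder


# Toy records, invented for illustration.
records = [
    {"Private_Tuision": "yes", "Sex": "M", "Grade": "A"},
    {"Private_Tuision": "yes", "Sex": "F", "Grade": "A"},
    {"Private_Tuision": "no", "Sex": "M", "Grade": "B"},
    {"Private_Tuision": "no", "Sex": "F", "Grade": "B"},
]

gain_tuition = information_gain(records, "Private_Tuision", "Grade")
gain_sex = information_gain(records, "Sex", "Grade")
print(gain_tuition, gain_sex)
```

In this toy data the tuition attribute separates the two grades perfectly, so its gain equals the full entropy of the class (1 bit), while splitting on sex leaves the class mix unchanged and yields a gain of zero.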

Experimental  Setup 

This section presents the details of the class attribute and the parameters used when creating the decision tree
model. The class attribute consists of four classes, as shown in Figure 1, and the parameter settings are shown in
Figure 2.

=== Run information ===

Scheme:      weka.classifiers.trees.J48 -R -N 3 -Q 1 -B -M 2
Relation:    edudata-
             weka.filters.unsupervised.attribute.Remove-R1-3,5-8-
             weka.filters.unsupervised.attribute.Remove-R1,15-
             weka.filters.unsupervised.attribute.Remove-R3-
             weka.filters.unsupervised.attribute.Remove-R1,7-
             weka.filters.unsupervised.attribute.Remove-R5
Instances:   373
Attributes:  13

Test Mode:   Evaluate on training data

=== Classifier model (full training set) ===

=== Summary ===


Figure  2:  Parameter  Setting  of  Experiment 
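The option string in the Scheme line can be read with the help of a small decoder. The flag meanings below are the standard J48 options; the `decode_options` helper itself is a hypothetical illustration and is not part of WEKA.

```python
# J48 command-line flags that appear in the run information, mapped to
# (property name, whether the flag is followed by a value).
J48_FLAGS = {
    "-R": ("reducedErrorPruning", False),
    "-N": ("numFolds", True),
    "-Q": ("seed", True),
    "-B": ("binarySplits", False),
    "-M": ("minNumObj", True),
}


def decode_options(option_string):
    """Translate a J48 option string into a dictionary of named settings."""
    tokens = option_string.split()
    settings = {}
    i = 0
    while i < len(tokens):
        name, takes_value = J48_FLAGS[tokens[i]]
        if takes_value:
            settings[name] = int(tokens[i + 1])
            i += 2
        else:
            settings[name] = True
            i += 1
    return settings


settings = decode_options("-R -N 3 -Q 1 -B -M 2")
print(settings)
```

Read this way, the run used reduced-error pruning with 3 folds and seed 1, binary splits, and a minimum of 2 instances per leaf.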
Table 2

Measure                            Instances  Percentage
Correctly Classified Instances     228        61.126%
Incorrectly Classified Instances   145        38.874%
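The percentages in Table 2 follow directly from the instance counts, as the short check below shows. The variable names are chosen for illustration only.

```python
# Counts reported in Table 2.
correct, incorrect = 228, 145
total = correct + incorrect

accuracy = 100 * correct / total
error_rate = 100 * incorrect / total

print(f"{total} instances: {accuracy:.3f}% correct, {error_rate:.3f}% incorrect")
```

Because the test mode is evaluation on the training data, these figures describe how well the tree fits the 373 collected records rather than its performance on unseen students.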

J48 parameter settings (weka.gui.GenericObjectEditor, weka.classifiers.trees.J48)

binarySplits          True
debug                 False
minNumObj             2
numFolds              3
reducedErrorPruning   True
saveInstanceData      False
seed                  1
subtreeRaising        True
unpruned              False

Figure  3:  Generated  Decision  Tree  with  J48  Classifier 

CONCLUSIONS 

In this study we generated the decision tree model shown in Figure 3. If-then rules can easily be extracted from a
decision tree, and our aim is to generate valuable if-then rules from the student data. These rules may be useful
for taking decisions to improve the academic performance of technical college students.
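Rule extraction amounts to reading off one rule per path from the root to a leaf. The sketch below assumes a tree stored as nested `(attribute, branches)` tuples with class labels as leaves; this representation, the `extract_rules` function and the example tree are hypothetical and do not reproduce the tree of Figure 3.

```python
def extract_rules(tree, conditions=()):
    """Yield one if-then rule per root-to-leaf path."""
    if isinstance(tree, str):  # a leaf holds a class label
        premise = " AND ".join(conditions) or "TRUE"
        yield f"IF {premise} THEN class = {tree}"
        return
    attribute, branches = tree
    for value, subtree in branches.items():
        yield from extract_rules(
            subtree, conditions + (f"{attribute} = {value}",)
        )


# Hypothetical tree (attribute names from Table 1, structure invented).
tree = (
    "Attendence_College",
    {
        "high": "A",
        "low": ("Private_Tuision", {"yes": "B", "no": "C"}),
    },
)

rules = list(extract_rules(tree))
for rule in rules:
    print(rule)
```

Each test along a path becomes one condition of the rule, and the leaf at the end of the path supplies the predicted class.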